Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure
نویسندگان
چکیده
K-Modes is an extension of K-Means clustering algorithm, developed to cluster the categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching or mismatching measure. Weight of attribute values contribute much in clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency of attribute values in the cluster and in the data set. The new weighted measure is experimented with the data sets obtained from the UCI data repository. The results are compared with K-Modes and K-representative, which show that the new measure generates clusters with high purity. Keywords—Clustering, categorical data, K-Modes, weighted dissimilarity measure
منابع مشابه
A dissimilarity measure for the k-Modes clustering algorithm
Clustering is one of the most important data mining techniques that partitions data according to some similarity criterion. The problems of clustering categorical data have attracted much attention from the data mining research community recently. As the extension of the k-Means algorithm, the k-Modes algorithm has been widely applied to categorical data clustering by replacing means with modes...
متن کاملA fuzzy k-modes algorithm for clustering categorical data
This correspondence describes extensions to the fuzzy k-means algorithm for clustering categorical data. By using a simple matching dissimilarity measure for categorical objects and modes instead of means for clusters, a new approach is developed, which allows the use of the k-means paradigm to efficiently cluster large categorical data sets. A fuzzy k-modes algorithm is presented and the effec...
متن کاملA Global-Relationship Dissimilarity Measure for the k-Modes Clustering Algorithm
The k-modes clustering algorithm has been widely used to cluster categorical data. In this paper, we firstly analyzed the k-modes algorithm and its dissimilarity measure. Based on this, we then proposed a novel dissimilarity measure, which is named as GRD. GRD considers not only the relationships between the object and all cluster modes but also the differences of different attributes. Finally ...
متن کاملGenetic Distance Measure for K-modes Algorithm
K-means algorithm has been shown to be an effective and efficient algorithm for clustering. However, the k-means algorithm is developed for numerical data only. It is not suitable for the clustering of non-numerical data. K-modes algorithm has been developed for clustering categorical objects by extending from the k-means algorithm. However, no one applies this technique for classification of c...
متن کاملImproving K-Modes Algorithm Considering Frequencies of Attribute Values in Mode
The original k-means algorithm is designed to work primarily on numeric data sets. This prohibits the algorithm from being applied to categorical data clustering, which is an integral part of data mining and has attracted much attention recently. The k-modes algorithm extended the k-means paradigm to cluster categorical data by using a frequency-based method to update the cluster modes versus t...
متن کامل